AITopics | code search model

Collaborating Authors

code search model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Code Search Debiasing:Improve Search Results beyond Overall Ranking Performance

Zhang, Sheng, Li, Hui, Wang, Yanlin, Wei, Zhao, Xiu, Yong, Wang, Juhong, Ji, Rongong

arXiv.org Artificial IntelligenceNov-24-2023

Code search engine is an essential tool in software development. Many code search methods have sprung up, focusing on the overall ranking performance of code search. In this paper, we study code search from another perspective by analyzing the bias of code search models. Biased code search engines provide poor user experience, even though they show promising overall performance. Due to different development conventions (e.g., prefer long queries or abbreviations), some programmers will find the engine useful, while others may find it hard to get desirable search results. To mitigate biases, we develop a general debiasing framework that employs reranking to calibrate search results. It can be easily plugged into existing engines and handle new code search biases discovered in the future. Experiments show that our framework can effectively reduce biases. Meanwhile, the overall ranking performance of code search gets improved after debiasing.

code snippet, query, snippet, (15 more...)

arXiv.org Artificial Intelligence

2311.14901

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Backdooring Neural Code Search

Sun, Weisong, Chen, Yuchen, Tao, Guanhong, Fang, Chunrong, Zhang, Xiangyu, Zhang, Quanjun, Luo, Bin

arXiv.org Artificial IntelligenceJun-12-2023

Reusing off-the-shelf code snippets from online repositories is a common practice, which significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural language queries. Neural code search models are hence behind many such engines. These models are based on deep learning and gain substantial attention due to their impressive performance. However, the security aspect of these models is rarely studied. Particularly, an adversary can inject a backdoor in neural code search models, which return buggy or even vulnerable code with security/privacy issues. This may impact the downstream software (e.g., stock trading systems and autonomous driving) and cause financial loss and/or life-threatening incidents. In this paper, we demonstrate such attacks are feasible and can be quite stealthy. By simply modifying one variable/function name, the attacker can make buggy/vulnerable code rank in the top 11%. Our attack BADCODE features a special trigger generation and injection procedure, making the attack more effective and stealthy. The evaluation is conducted on two neural code search models and the results show our attack outperforms baselines by 60%. Our user study demonstrates that our attack is more stealthy than the baseline by two times based on the F1 score.

code snippet, information retrieval, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.17506

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
(14 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

COSEA: Convolutional Code Search with Layer-wise Attention

Wang, Hao, Zhang, Jia, Xia, Yingce, Bian, Jiang, Zhang, Chao, Liu, Tie-Yan

arXiv.org Artificial IntelligenceOct-19-2020

Semantic code search, which aims to retrieve code snippets relevant to a given natural language query, has attracted many research efforts with the purpose of accelerating software development. The huge amount of online publicly available code repositories has prompted the employment of deep learning techniques to build state-of-the-art code search models. Particularly, they leverage deep neural networks to embed codes and queries into a unified semantic vector space and then use the similarity between code's and query's vectors to approximate the semantic correlation between code and the query. However, most existing studies overlook the code's intrinsic structural logic, which indeed contains a wealth of semantic information, and fails to capture intrinsic features of codes. In this paper, we propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the valuable code's intrinsic structural logic. To further increase the learning efficiency of COSEA, we propose a variant of contrastive loss for training the code search model, where the ground-truth code should be distinguished from the most similar negative sample. We have implemented a prototype of COSEA. Extensive experiments over existing public datasets of Python and SQL have demonstrated that COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2010.0952

Genre: Research Report > Promising Solution (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Releasing a new benchmark and data set for evaluating neural code search models

#artificialintelligenceNov-25-2019, 19:47:04 GMT

A new benchmark to evaluate code search techniques. The benchmark includes the largest evaluation data set currently available for Java, consisting of a natural language query and code snippet pairs. This data set comprises 287 Stack Overflow question-and-answer pairs from the Stack Exchange Data Dump. Also included is a search corpus that contains more than 24,000 of the most popular Android repositories on GitHub (ranked by the number of stars) and is indexed using the more than 4.7 million method bodies parsed from these repositories. A score sheet on the evaluation data set, using two models from our recent work, is also included.

code search model, evaluation data, search model, (11 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)

Add feedback